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Abstract 

Gene duplication is believed to play a major role in the evolution of genomic 
complexity. The presence of a duplicate frees a gene from the constraint of 
natural selection, leading to its loss of function or the gain of a novel one. 
Alternately, a pleiotropic gene might partition its functions among its du- 
plicates, thus preserving both copies. Such arguments invoking duplication 
for novelty or specialisation are not true of diploid genotypes, but only of 
haplotypes. In this paper, we study the consequences of regulatory interac- 
tions in diploid genotypes and explore how the context of allelic interactions 
gives rise to dynamical phenotypes that enable duplicate genes to spread 
in a population. The regulatory network we study is that of a single au- 
toregulatory activator gene, and the two copies of the gene diverge either 
as alleles in a diploid species or as duplicates in haploids. These differences 
are in their transcriptional ability - either via alterations to its activating 
domain, or to its c«s-regulatory binding repertoire. When c^s-regulatory 
changes arc introduced that partition multiple regulatory triggers among 
the duplicates, it is shown that mutually exclusive expression states of the 
duplicates that emerge are accompanied by a back-up facility: when a highly 
expressed gene is deleted, the previously unexpressed duplicate copy com- 
pensates for it. The diploid version of the regulatory network model can 
account for allele-specific expression variants, and a model of inheritance of 
the haplotype network enables us to trace the evolutionary consequence of 
heterozygous phenotypes. This is modelled for the variations in the acti- 
vating domain of one copy, whereby stable as well as transiently bursting 
oscillations ensue in single cells. The evolutionary model shows that these 
phenotypic states accessible to a diploid, heterozygous genotype enable the 
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spread of a duplicated haplotype. 



1. Introduction 

Gene duplication is a major source of genomic expansion and is believed 
to underlie the evolution of complex biological functions [U [2]. Functions 
encoded by two copies of the same gene are redundant making the loss of 
function of a duplicate by mutation a likely outcome [3] . Hence, the observed 
abundance of duplicates in plant and animal genomes makes the retention 
of duplicate genes a much studied problem. The loss of selection pressure 
on the duplicate can be viewed not as a prerequisite for its elimination, 
but as an opportunity for it to become more abundant by acquiring a new 
and fitness-enhancing function [1], a process called neofunctionalization. If 
the duplicated gene already had multiple roles (is pleiotropic), their par- 
titioning among the duplicates would make each essential, a model called 
subfunctionalization Q . Appeals to variations amongst duplicate genes can, 
in diploid species, also be applied to allelic variations in singleton genes 
The loss of one or both allelic function has been the basis of debates on 
whether dominance in Mendelian inheritance is an evolutionary or physio- 
logical phenomenon [6]. Dominance and gene duplication have both been 
framed [7] as phenomena that involve gene dosage - the contribution of the 
number of functional genes to phenotype - and the functional redundancy 
and fitness of genes may involve quantitative factors. Quantitative consid- 
erations include the disruption of stoichiometrically balanced protein levels 
when gene duplication increases the expression of one interacting partner |8]. 
It is the network of interactions that mediate the causal pathways from genes 
to phenotype and consequent evolutionary outcomes. 

Novelties in evolution often emerge via changes to an organism's develop- 
ment. A common mechanism in developmental trajectories is the transfor- 
mation of transient stimuli into steady-state expression levels [101 (HI [12] 
by gene regulatory circuits that implement positive feedback, wherein a 
gene upregulates its own expression. Such an autoregulatory gene activator 
formed the basis of an experimental study [13| on the "reversal of subfunc- 
tionalization" in the pathway that governs brain-stem development. While 
different developmental triggers activate the paralogous hoxal, hoxbl gene 
pair in modern mice, they were replaced by a single autoregulatory activa- 
tor responsive to a common set of with cis-regulatory sites cis-regulatory 
inputs, and the resulting organism was viable [13\ . It is this circuit of a 
self-regulating activator, the smallest unit of positive feedback, that is the 
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object of our study. 

We study the consequences of duplication of this gene, both at the level 
of phenotype and in potential evolutionary outcomes, upon mutation. When 
mutations appear in this circuit as per the subfuntionalization model, intro- 
ducing complementary loss of function cis-regulatory mutations that confers 
tissue-specific expression patterns [H [U] , we find the emergence of a prop- 
erty of some developmental trajectories, namely redundancy of duplicates 
helps overcome mutational loss [15j. The mutation of a gene expressed in a 
tissue is compensated for by the upregulation of its unexpressed partner in 
this model, called transcriptional backup [16]. Mutations affecting the acti- 
vating sites of the transcription factor give rise to the onset of oscillations, 
both stable and bursty. Since the mathematical model for oscillations is also 
applicable to a diploid genotype with two alleles differing at their activating 
site, we examine the evolutionary fate of the duplicated gene haplotype in a 
diploid species via the fitness [12 [181 [7] of the oscillatory phenotype. Fitness 
values are indexed by parameter values in the doubled activator circuit that 
give rise to distinct qualitative dynamics. 

The dynamical system we study is that of a diploid model of the tran- 
scriptional network [19], which enables us to consider both the interactions 
between diverging duplicates as well as allelic interactions [20j, particularly 
those between allelic variants or heterozygotes. Heterozygous advantage has 
been identified as a property that facilitates the fixation of a duplicate gene 
in a population |2H [T71 [5] , a finding of relevance to our results below. Het- 
erozygosity is also commonly associated with hybrid vigour; indeed, this 
correlation has also been extended to dosage (im-) balance of copy number 
variations and alterations to the amplitude of circadian rhythms [22] . Unlike 
circadian clock models [23] , which contain a negative feedback link [23] in cir- 
cuit topology, our duplicated activator network does not introduce negative 
feedback explicitly. Instead, negative feedback arises due to a competitive 
mechanism whereby one activator excludes the binding of another to the 
promoter. Although the mechanism implementing negative feedback - and 
thus oscillations - is different, our model displays a dependence of amplitude 
on hetrozygosity and dosage balance found to be correlated to hybrid vigour 
[22] . Furthermore, the presence of paralogous genes in oscillatory processes 
has been noted in the literature [25t [2B], also as a means to maintain os- 
cillations upon loss of single genes |27| [28] . Although our model makes no 
explicit reference to the systems that contain the genes reported there, our 
theoretical finding suggests an evolutionary mechanism for the proliferation 
of duplicate genes that take part in oscillatory dynamics as heterozygotes. 

There are two stages of modelling that we perform in this paper. The 
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key first step is to frame the onset of qualitative changes to the dynamics of 
transcription regulation as a consequence of gene duplication in the language 
of dynamical systems. The second step is to model the likely evolutionary 
fate of a mutant haplotype containing such a duplicate. For the first step, 
we set up a model of transcriptional regulation of a genotype characterised 
by a duplicated positive feedback loop, with the transcription rates derived 
from the probability of promoter occupancy by RNA polymerase [29] in the 
presence of the activators. An extension of the model to include the effects 
of intrinsic noise - stochasticity in transcription - is presented, to address 
how single cell and population averaged phenotype might differ. To enable 
the second stage of analysis on the evolutionary fates, a population genetic 
model for the likely invasion of a duplicated gene is then introduced, with 
selection coefficients that depend on parameters of the transcriptional net- 
work. Thereafter, we present results on the behaviour of these models. We 
identify qualitative shifts in the dynamics owing to alterations in parameter 
values - in cis and in trans - to be presented separately. After a discussion of 
the results owing to cis-regulatory changes that affect switching behaviour, 
we address the case of changes that affect regulation by irons-acting effects. 
It is this set of changes that we shall track in the second stage of mod- 
elling, that of the evolutionary fate of the duplicated gene. A final section 
summarises the different components discussed within the regulatory model, 
linking the qualitative aspects of model behaviour to different experimental 
studies. 

2. Model of the duplicated autoregulatory gene sw^itch 

A positive feedback loop provides a mechanism to convert transient input 
signals into stable output levels, acting as a switch. Developmental stages, 
characterised by stable expression levels of subsets of genes, rely on regu- 
latory circuits that implement positive feedback switches [9l [11] . The 
smallest circuit implementing positive feedback is one with a single autoreg- 
ulatory gene (see Figure [T]); for it to act as a switch, it is necessary for the 
activation reaction of a transcription factor binding to the gene promoter 
to have a cooperativity index, or Hill coefficient, of 2 or more [30]. As a 
specific instantiation of such cooperativity, we require that the autoregula- 
tory gene at the top of the hierarchy activates itself after dimerisation of 
its protein product (as shown in Figure [T]), giving rise to a Hill coefficient 
of 2, although nothing in the model requires such a specific reaction. In 
particular, we disregard consideration of heterodimeric or homodimeric as- 
sociation, and assume the existence of homodimers alone. As will be clear 



4 



from the analysis below, a greater degree of cooperativity facilitates many 
more qualitative changes to the dynamics, which we shall disregard in the 
interest of parsimony. Our model takes such an autoregulatory gene cou- 
pled to target genes, which makes up a topology of the "terminal selector" 
network type jlTj, and duplicates it (see Figure [2]). When duplicating the 
gene, we shall assume that both coding and regulatory regions of the gene 
are duplicated; thus, we end up with the network on the right hand side 
of Figure [2j Other influences on the activator might define developmental 
context or tissue specificity. Such interactions operationally outline the loci 
of context dependent changes that we shall introduce, and are indicated by 
the arrows labelled h into the activators ai^2 in Figure [2| 



cis-regulatory protein 
sequences coding 




dimerization 



translation 



Figure 1: The haplotype on the left supports a representation of the schematic kinetics 
of transcription via dimeric activators which, with the adhesive reactions faciUtated by 
the "helper" proteins, bind to DNA and recruiting the RNA polymerases. The mRNA 
transcribed is then translated, and homodimers are formed before the autoregulatory 
reactions proceed. There are further reactions that involve decay of mRNA and proteins 
that are considered in the models below. 



2.1. Thermodynamic model of gene activation 

The "thermodynamic" approach to modelling transcription is to set the 
rate of transcript formation to be proportional to the occupancy of the 
promoter of the gene to be expressed from a single allelic locus [29]. Pro- 
moter occupancy is facilitated by transcription factors that bind to cognate 
DNA sequences and recruit the transcription machinery - RNA Pol II, me- 
diator complex forming multi-component proteins, etc. The probability of 
occupancy is accounted for by assigning Boltzmann factors for the possi- 
ble configurations of protein-DNA bound states [29t |3T] . The binding and 
unbinding protein-DNA reactions of are assumed to be in rapid detailed 
balance to justify the use of the thermodynamic formalism; hence the ratios 
of these reaction probabilities are given by the negative exponential of the 
difference of free energies of the bound/unbound configurations. 

The contribution of these configurations to the promoter occupancy is 
modelled in detail in Appendix A. Here we provide the resulting expression 
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Figure 2: Haplotype and network view of the duplicated gene. The autoregulatory 
gene is part of tlie ancestral genotype and displayed in its network context on the left, 
with the allelic variants on top. The duplicated haplotype leads to an increased number 
of regulatory interactions in the network on the right. 



for the probability of occupancy of the promoter of a gene i. This can 
be introduced in terms of the function of the transcription factors ai, 
(represented in vector notation cx = (0:1,02)) 

+ UxTixax + ti2ri2a2 
1 + tiiai + ti2a2 (1) 



which measures the amount of transcript produced in the presence of the 
ai relative to the basal rate of transcription r^o, when transcription fac- 
tors are absent, a quantity called fold-change. The parameters rii,n2 in 
= (?'iO; '^ii) '''12) stand for the strength of recruitment of the transcription 
machinery (Pol II, Mediator, etc) [32] by ai (rate rn) and 02 (rate rj2) from 
the chromosomal locus indexed by i. The parameters tij in tj = {tii,ti2) 
stand for cis-regulatory binding strengths of protein (aj) binding to DNA 
locus {i), determined by a more detailed description in eq. ^ below. The 
probability of promoter occupancy is given by 



$i(Q,ri,tj 



1 



(i = l,2) 



(2) 



The derivation of these expressions for <I> » and ^ j follows |33t [31] , and is 
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presented in Appendix A. There we also derive the expression|3]below, which 
includes a "helper" protein that serves as a proxy for contextual influences. 
While the Aj indicates the proteins of interest (the duplicated activators) the 
"helper" proteins h are introduced to indicate the presence of co-regulators 
that enable tissue or developmental context-specific expression. In the fol- 
lowing expressions ([3|, various es denote binding energies: denotes the 
non-specific binding of protein A to any of Nns DNA binding sites; ^Akd^ 
stands for the binding of Aj^ to its cognate site d^; eab measures the energy 
of protein-protein interactions between A and B; ShAjp the glue- like inter- 
action between helper h, activator Aj and the RNA polymerase/Mediator 
complex p: 



A. 

Nns 



nj = exp(-/3eA,p) 



/ h \ 

1 + exp(-/3(e^ - ela + ShA.p - eA,p)) 

1 + ^exp(-/3(e^ -e^J) 



"ns 

tij = exp(-/3(e^^.^^ - e%^^^)) ( 1 + ^ exp(-/3(e^ - el^)) 



"ns 



(3) 

The details of the derivation is provided in Appendix A. 2 

Eq. ([3]) parameterises are the molecular interactions that determine tran- 
scription rates in the gene regulatory network. While most of the paper will 
treat the rij, tij (and other rates to be introduced shortly) as the parameters 
that determine network behaviour, the explicit definitions ([s]) reveal the sub- 
strates upon which mutations may act in order to change the protein-DNA 
and protein-protein interactions that determine phenotypic outcomes and 
evolutionary fates. In particular, in order to to present verbal arguments 
that rely on mutations that disrupt pleiotropic properties and restrict them 
to context-specific roles (as in the subfunctionalization model of see also 
|34] ) the helper protein index h will serve as a placeholder for such context- 
specific factors. This variable will be inherited by both copies of the gene 
upon duplication, and contribute to the set of mutations in cis (labelled by 
i) by altering binding energy terms carrying both i and h indices. Having 
multiple helpers allow cis-context dependent effects by altering their bind- 
ing affinities to DNA, and in particular in complementary ways - one helper 
(and not the other) binds to locus 1 and not to locus 2 and vice versa. We 
shall also study the effects of mutations that affect sites recruiting the tran- 
scriptional machinery and bring about changes in trans-acting terms ea p 



and/or EhA p, with the case of multiple helper proteins detailed in Appendix 



A. 4 Measuring binding energies relative to /3 = ksT ~ 0.6 kcal/mol, one 
can estimate the scale of the changes to parameters {tij, rij) — )• {t'-^ r[-). In 
particular, a change in binding free energy of AAG = 1.5 and 2.5 kcal/mol, 
typical of biophysical measurements for mutant effects, translates to ratios 
of tij/t[j or rij/r'^j of ~ 12 or 65 respectively. These are the parameter ranges 
for which qualitative changes to transcriptional dynamics are noted below. 
The actual parameters used to perform the computational experiments as 
in 



Appendix B 



2.2. Model of haploid transcriptional deterministic kinetics 

In the thermodynamic model for the rate of transcript production, we 
make the assumption that the transcriptional activators Ai act as homod- 
imers, as shown in the representation of the autoregulatory circuit in Figure 
[T] that we shall take to be the ancestral haplotype. Dimer formation by 
monomer binding and unbinding is assumed to rapidly be in detailed balance 
on the time scale of gene expression, so that the concentrations of dimers 
[Ai] = xf/K(iim,i where Xi is the concentration of monomers and Kdim.i is 
the dissociation constant for dimerisation. Monomers are translated from 
transcripts which decay at rates 5i much greater than the decay rates A,, of 
the corresponding proteins. This assumption of faster time-scale of mRNA 
dynamics leads to the consequence that mRNA levels are slaved to protein 
dynamics. The equations for the rates of changes to proteins xi and X2 thus 
captures the essential dynamics of the system. Here we present the model 



for the haploid case, with the detailed derivation provided in Appendix A. 3 

Xi — C-iipi (x, Tj , t j) ^iXi , {i — li2), 

y = f(a;i,a;2,y) 

where f(x,y) is the dynamical sub-system for the set of downstream vari- 
ables y that the activator gene x influences, (/?i(x, rj, tj) = <^j(x'^, r^, tj) in 
eq. Q, with A, the linear degradation rates for the proteins. The vari- 
ables Xi have been scaled in terms of protein-DNA and dimer dissociation 



constants (see Appendix A.3), and the parameters Cj, 



Ci — - l,o; 

Oi 

are defined in terms of the rates for translation (vTj) and mRNA degrada- 
tion [5i). The transcription rate jii expresses the proportionality between 
promoter occupancy in the thermodynamic description at each chromoso- 
mal locus, and captures the number of such loci, which will play an 
important role in our later discussion on haploid and diploid cases. 
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Table 1: Reaction scheme for stochastic expression of proteins Ai, i = 1,2. There are 
3 ways in which they are produced, by basal transcription, by regulated recruitment of 
transcriptional machinery by Ai or by A2 followed by translation. In the above, Fi = 
(1 + no) + (1 + + (1 + ri2)ti2{-^)'^ , is the volume factor and Ki is the 

geometric mean of the dissociation constants for dimerisation (K^^^) of protein Ai and 
for protein-DNA binding = ifj^^ expiPie^.^^ - £%a))- 



reactions 


rate of reaction 


protein production: 

, basal J 

(pi — > Ai 
(pi -A- Ai 


rio 

c- ( \ 




protein degradation: 
Ai^% 





2.3. Stochastic kinetic model 

The development of novel experimental techniques for tracking gene ex- 
pression in single cells has made opened up for observation the consequences 
of the stochastic nature of the dynamics of gene regulation on phenotypes 
and their evolution 133 EZ]- In this section we present a simplified model 
of gene expression in this doubled autoregulatory gene network. This will 
enable us to explore the consequences of intrinsic regulatory noise on the 
phenotype and what its implications might be for evolutionary fates of the 
duplicate genes. While the detailed model of stochastic kinetics is provided 
in Appendix A. 3, here we present a simpler model that incorporates many 
of the reaction steps into a Hill-type gene regulatory function, just as in the 
deterministic version in eq. Q . The reactions are summarised in Table [T] 

2.4^. Diploid model of regulation 

In diploid organisms, each gene comes in two copies independent of any 
duplication event. To examine how duplication generates novelty, or to track 
how a genotype containing a mutant duplicate can be subject to evolutionary 
modelling, we need to construct a model of transcriptional regulation that 
can track allelic variants. Further, we restrict differences between alleles to 
be solely in their activation sites, with different alleles (both before and after 
duplication) affecting transcription rates only by differing affinities for the 
transcriptional machinery - SA-^p 7^ £A2p and EhA^p / £hA2p- Consequently 
the coefficients in rj = (rjo, Tii, ri2) are independent of i. To restrict our 
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attention to activation site changes, based on results obtained from model 
analysis below, we shall assume that there are no differences between the 
alleles in the free energies of protein-DNA binding, hence we take tj = 
tj = 1, where we have rescaled the concentration variable to be multiples 
of the protein-DNA and dimer dissociation constants that make up tj (see 



Appendix A.3). While the dynamics of transcriptional regulation will be 
studied in both i-dependent and z-independent cases below (where i is the 
genomic locus), the evolutionary dynamics of duplicates will be explored 
only in the context of mutational variations affecting activating sites in a 
modular fashion, independent of cis-regulatory context. 

2.4-1- Singleton case 

(i) 

If there are two allelic forms x- where j = 1,2 for each gene Xi we 
can extend the development leading to Q to consider how each allele gen- 
erates a transcript at the rate determined by its promoter occupancy by 
transcription factors. We can now set up a model to investigate variability 
in transcription dynamics owing to allelic variations, or heterozygosity for 
the autoregulatory circuit prior to duplication. We introduce an allelic index 
j = 1, 2 in the fold-change for each gene labelled by i [T^. For the case 
of a single locus i = 1 with the corresponding fold change '^^\x^i \ x^^ ) . 
If the two alleles the same, the homozygous case, the diploid 

genotype network is shown in Figure [s] (A) , but the model dynamical model 
is equivalent to the ancestral haplotype in Figure [2]^a), or the switch model 
of a single autoregulatory gene, but with the parameter representing protein- 
DNA binding doubled: t i— )• 2t (eq. ([T]). For the heterozygous case, allelic 
variants x^\x^^'^ may be regarded as the two distinct activators Ai and A2 



in the haplotype depicted in Figure . The dynamical equations are then 
the ; 



the same as in the haploid case, except that x^^'^ 1— xi and x^^'' 1— t- X2 in eq. 



2-4-2- Duplicated allele 

There are two cases to consider when one of the alleles is duplicated - 
if the ancestral haplotype is homozygous or heterozygous. If homozygous, 
the duplicated genotype is again of the original haplotype topology as in 
Figure [2]^a). There is thus only one species of transcription factor protein 
which regulates itself and its downstream components y, with the effective 
dissociation constant to its DNA binding site scaled by a factor of 4 relative 
to the single copy haploid model. This will only shift the the threshold for 
switching in the analysis presented below. 
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If the ancestral genotype is heterozygous, containing two different aheles 
fli and a2 at a locus labelled by duplication creates another chromosomal 
locus, B, in a mutant haplotype containing 2 copies of one genetic sequence, 
say ai6i. The corresponding genotypes can in a mixed mating population, 
considered in greater detail below, are of the form a, / aj (4 combinations for 

= 1,2) aibj/a^ (8 combinations) and Uibj/akbi (16 combinations). All 
these combinations require only 2 different species of transcription factors. 
If the allelic variant is in a coding region, the differences show up in the 
rates of recruitment of the transcription machinery, £Ajp, whereas if the 
mutations are in the regulatory region, the differences to be considered are 
in the binding of helper/coactivator proteins, that change EhAjp in gQ- 
(The inability to form heterodimers between distinguished alleles is still 
assumed.) Upon taking these changes into account, the difference in the 
number of distinct gene copies to be considered is contained in the parameter 
[(j)"^^] that provides allelic specificity to Cj in eq. 

In sum, the corresponding model is described by the ODEs 



where (again) the superscript {k) refers to the two alleles and their corre- 
sponding parameters refer to possibly different binding interactions at each 
allele as introduced. The parameters Cj in (eq. ([s]) can be extended to reflect 
potential allelic differences in transcription, translation or degradation rates 
for proteins and transcripts, but which we keep the same for both copies of 
the duplicated allele, since our interest lies in the mutation that leads to an 
allele with two copies. Thus, to present the results in the latter sections, it 
is the model in eq. (|4]) that will be studied with the parameter q playing a 
role in the evolutionary discussion. 

2.5. Adaptive dynamics of duplicated gene 

Most of what we have modelled of the autoregulatory switch has been 
independent of the mode of its inheritance, to which we have appealed to 
in order to set up the terms of discussion, but which we have not presented 
in detail to pursue its possible evolutionary implications. In particular, our 
discussion has implied a continuity of argument from haplotype to inher- 
itance and evolutionary fate. As noted, the key difference introduced by 
duplication at the level of haplotype is the multiple feedback structure, an 
interaction topology that is novel for haploids but not diploids. For the 
diploid case, the locus of change for system dynamics upon duplication is 




(6) 



E,cfVf^(x,rW,tW)-A,x. 
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Figure 3: Recombination amongst haplotypes and topologies for diploid reg- 
ulatory networks. Mating between two diploid genotype networks, one containing a 
chromosome with a duplicated segment of DNA, is shown with an X. The diploid network 
with duplicate genes on both chromosomes is not shown as that occur with a very small 
probability immediately after the duplication event. A duplicate segment of DNA gets 
shufHed by the process of recombination, which is assumed to occur at a rate p. Of the 
mating genotypes, the alleles chosen in the gametes are the dotted and dashed pairs, of 
which the dashed pair is tracked in the figure as it undergoes recombination events. In 
the model presented below, singleton alleles are at their equilibrium frequencies in the 
population. 



tracked by the parameter which incorporates the copy number [(/>j]. The 
number of functional copies of the gene used for Mendelian arguments and 
for matching their effects to the environments inhabited by the phenotype 
ranges from to 4 in the diploid case |7]. Recombination in sexually re- 
producing diploid organisms "randomises" novel mutations along genomes 
and may offer greater possibilities for fixation by selective forces [38j . In the 
previous section we have associated the analysis of divergent properties of 
post-fixation duplicates to pre-duplication heterozygotes [211 E] by mapping 
them onto a common dynamical system for transcriptional dynamics. In 
this section, we investigate the evolutionary consequences of network level 
effects of homozygous and heterozygous genotypes using a model for the 
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evolution of duplication based on [jL7J. In this model, a mutant duplicated 
allele is introduced into, and examined for its ability to invade, a population 
characterised by an existing equilibrium gene pool. Equilibrium configura- 
tions provide standard population genetic backgrounds in which a mutant's 
ability to survive is analysed. The transcriptional network of interest in this 
paper makes it natural to consider the effects of distinct alleles in the equi- 
librium population; hence, we examine the case of equilibrium maintained 
by over dominance, where heterozygotes are fitter than either homozygote in 
this bi-allelic setting. We also do not wish to consider the effects of muta- 
tion after the duplication event in the evolutionary picture (as it is already 
a lengthy paper); thus, we eliminate consideration of equilibria involving 
mutation-selection and mutation-drift balance |39j. 

The model includes haplotypes defined via a pair of loci A, B, with allelic 
values of oi, 02 for A and 60, 61, &2 for B. bo is the null allele, so the haplo- 
types aibo and 0260 are the alleles ai, 02 prior to duplication. (Note, that we 
have introduced the null allele at the second locus for convenience. If we al- 
lowed for a null allele, oq at the first locus, the pre-duplication alleles would, 
equivalently, be expressed as ao^i and 00^2.) In the overdominant case, the 
model assumes an initial polymorphic equilibrium configuration of alleles 
ai and 02, via heterozygote advantage [21j. These singleton haplotypes are 
also called aibo and 0260 below. Homozygotes ai/ai and 02/02 (also named 
aibo/aibQ and 0260/02^0) have fitness coefficients 1 — s and 1 — t relative 
to the heterozygote 01/02 with a fitness coefficient of 1; consequently, the 
relative frequencies of alleles oi and 02 at equilibrium are xio = t/{t + s) 
and X20 = s/{t + s), respectively [39]. The fitness of the population at this 
polymorphic equilibrium is W = 1 — ts/ {s + t). 

A duplicate mutant is introduced into a background of this existing equi- 
librium condition. The duplicate haplotypes are denoted 0161, 0261, 0162,0262 
with frequencies xn, X21, a;i2, X22 respectively. Following [1^, we assume a 
two step decomposition for updating the haplotype frequencies: 

Xij I—)- x*j by recombination, and we shall ignore x*j 1— )• x[j by mutations. 

We set up the discrete dynamics of frequency updates due to recombina- 
tion by first assuming that the mutant duplicate has a probability of being 
chosen for mating with a probability ~ N^"^ where A'^e is the effective pop- 
ulation size, much smaller than those of the singletons xio = t/{t + s) and 
^20 = s/{t + s) which are at their equilibrium frequencies. Therefore the 
probability of pairing duplicate gametes with each other is negligible com- 
pared with those of pairing a duplicate haplotype with that of a singleton 
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|17j . After duplicating oi to produce aibi, recombination, mating individu- 
als containing gametes of haplotype ai6i with those containing 02^0 occurs 
with probability proportional to X2o2;ii, creating a ai6o/a2&i genotype at a 
rate proportional to recombination probability p and 0260/^^1^1 with proba- 
bility {1 — p). The fitness values of the two genotypes are indexed by their 
subscripts - Wio;2i and W20;ii respectively. To first order, the equilibrium 
frequencies xio and X20 are unchanged. 

The discrete map is represented via the block-diagonal matrix: 



/ ^Ii \ 



^11 

■^21 



X 



22 



1 

w 



R-11X21 







^22x12 



V 4i / 



/ ^11 \ 

X21 

X22 

\ ^12 / 



where the (2x2 submatrices are defined in ([ 

/O)W^20;ll£20 



R 



11x21 



1^10;ll£lO + (1 
pH^20;lli20 



below. 



W20;21^20 + (1 



(7) 



pW^10;2lAlO 
/0)W^10;2lAlO 



R 



22x21 



/O)W^10;22£l0 /CVF20;12^20 
W'l0;12^10 + (1 - /0)W^20;12A20 

(8) 

The fitness allocation for aibo/aibQ is taken to be the same as that of the 



W^20;22£20 + (1 
pW^10;22ilO 



1 — s and W' 



20;20 



aibi/aibo genotype, for i = 1,2, so Tyio;io = Wio-n 
W^20;22 = 1 — t. This assumes that, since identical genes can only generate 
switch-like behaviours, the outcome of increasing dosage is a shift in the 
threshold for switch activation only, which we assume to be neutral (see 
below). For the case where the copy numbers for the allelic variants of the 
single genes are in the ratio 2 : 1 or 1 : 2, we take W"io;i2 = l^i0;2i = 
M^20;ii = 1 + d and Wiq-22 = VF20;i2 = W20;2i = 1 — u. We shall make a 
note of the case d = since we would like to impose fewer constraints for 
positive selection explicitly. All of these fitness coefficients are normalised 
with respect to the pre-duplication heterozygote, VFio;20 = 1- The fitness 
coefficients are summarised in Table [2j 

Following standard practice, we have summarised the effects of viabil- 
ity and reproductive success by single scalar-valued parameters. We will 
need to relate these parameters to dynamical states in the model of the 
autoregulatory gene activator in order to make claims about evolutionary 
consequences of regulatory changes. Thus, s, t, d, u will be defined in terms 



of the parameters in the reaction system shown in A. 3 in the Appendix, and 



which for the purpose of dynamical analysis we have summarised in terms 
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Table 2: The fitness values Wij-ki for the different genotypes obtained by mating ij with kl 
gametes. The selection coefficients are with reference to the singleton heterozygote which 
is assigned a fitness of 1. All coefRcients s,t,u are positive, indicating reduced fitness 
compared to the a\/a2 heterozygous singleton, while both signs of d are briefly explored 
in the main text. 





10 20 11 21 22 12 


10 
20 


1-s 1 1-s 1+d l-u l+d 
1 l-t l + d l-u l-t l-u 



of the parameter combinations that arise in Q. Hence the selection coef- 
ficients of genotypes a^hi capture dependence on the network parameters 
s = s{ri,ti,Ci, Ai) in a specific environment via their dynamical behaviour. 
Since we do not model any particular environment in this paper, we will 
require verbal arguments indicating the plausibility of adaptive roles of the 
dynamical states in parameter space. The passage from a continuous set of 
biochemical parameters to a discrete set of selection coefficients will be moti- 
vated via the appearance of distinguished qualitative regions in phase space 
(see Figure 11 below) that partition the phenotype space into a discrete set 
of qualitatively different dynamical behaviour. 



3. The two component subsystem feeds novelty downstream 

In this section we will argue that the autoregulatory component of the 
network in Figure [2] can be analysed in isolation of its downstream effects for 
the kinds of effects that we will focus on - the effects of dosage (im)balance 
introduced by gene duplication. This will be obtained by showing how the 
eigenvalues of the linearised dynamics can be factorized. We then show an 
immediate consequence of competition between the two copies of a gene for 
regulatory binding and activation. 

3.1. Restriction to the two-component subsystem 

The Jacobian of the dynamical system in eq. Q has the same structure 
as the adjacency matrix in Figure [2]). Since the Jacobian determines the 
local dynamical behaviour around any state of the system, changes to its 
eigenvalues signal the onset of qualitative behavioural patterns. In particu- 
lar, around each fixed point of the dynamical systems of the gene regulatory 
networks shown in Figure [2| the Jacobian of the network gets updated as 
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In eq. (9), the (Qnxn) = ^ submatrix of the Jacobian corresponds to the 
part of the network enclosed in a dotted box in Figure [2j (^ji (i = 1, . . . , A^) 
are the partial derivatives of the promoter occupancy functions of the down- 
stream genes with respect to the variable representing the autoregulatory 
gene, an denotes the partial derivative of the net rate of expression of 
the autoregulatory gene with respect to its own expression state. After du- 
plication (or in the biallelic case), there are two genes that influence the 
downstream sub-network, and auto- and cross-dependencies in the regula- 
tory dynamics of the activators with Jacobian elements Ojj are introduced. 
The block structure of the Jacobian matrices implies that the determinants 
are of the form 

an ai2 

a21 022 



On X 1^1 and 



(10) 



and thus the eigenvalues that determine dynamical consequences factor into 
two pieces. This modular decomposition suggests the following strategy. 

The argument invoking the selective advantage of increased gene dosage 
[7] looks to the effect of the doubling of steady state levels x* — )• {x^ + X2) 
on the target gene dynamics f(x*,y) (where we use * to indicate steady- 
state levels in the regulatory network in a feed-forward manner. These 
changes f(xj + X2,y) — f(x*,y) may indeed be associated with a positive 
selection coefficient. However, we shall instead look at other sources of 
qualitative shifts - not gene dosage, but gene dosage balance instead. We 
shall assume that the autoregulatory positive feedback serves to set a binary 
decision switch for the downstream components to be triggered, and the net 
effect of this doubling 'merely' adjusts the threshold of the switch. Since we 
look for signals of dynamical shifts via the local analysis provided by the 
eigenvalues of the Jacobian, and the Jacobian in eq. ([9]) is of a factorised 
form, the evolutionary significance of the changes ascribed to downstream 
sensitivity to doubled gene dosage lies in in eq. (10). In this paper. 



we are interested in the qualitative changes that might influence phenotypes 
of duplicated genes lies in the other factor involving the Ojj components. 
In considering dynamical changes, such as the onset of oscillations to be 
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considered below, this explains why we focus on the two-gene sub-network 
involving the duplicated autoregulator. For the case of the switch, we shall 
use the downstream levels as a reporter of the dynamical switches in the 
duplicated gene motif. 

The set of reductions required to get to this simplified form Q assumes 
separation of time-scales of transcript and protein formation and degrada- 
tion relative to protein-DNA and protein-protein interactions. This can 
introduce differences in the time scales of the results, particularly when 
dimerisation is involved |40j - the full system has slower dynamics which 
shows up in its time-dependent behaviour. Similar consideration needs to 
be paid to the time scales of the interactions that couple the double-activator 
motif to the downstream components. It is known that coupling to down- 
stream components can alter qualitative dynamics of a network, a principle 
dubbed "retroactivity" |41j . We have checked that the dynamics of the cou- 
pling can indeed affect the behaviour of the system, as does the kinetics of 
dimerisation. However, we have checked that it is possible to find parame- 
ter ranges for downstream coupling and dimerisation kinetics for which the 
behaviour of the full system behaves in a manner qualitatively similar to 
the simplified model we choose to focus on, albeit with a slower time-scale 
for the dynamics. However, for the purposes of this paper, these differences 
are not significant; all the essential qualitative features predicted from this 
simple model are also present in the full kinetic description. 

For the analysis below, we shall focus on two principal cases for the 
combinatorial regulatory parameters tj,rj. In one, we set ri = r2 = r, cor- 
responding to the case when recruitment for activation is modular, i.e. in- 
dependent of cis- context i. Thus, (rio,rii,ri2) = (r2o, ?'2i, ''22) = {rQ,ri,r2) 
and v^i = ^2 = 'f- When czs-regulatory context matters for expression, for 
instance, in the scenario described in the subfunctionalization model, we 
consider ri = (ro,ri,r2) and T2 = (ro,r2,ri). If ri > r2, this choice indi- 
cates a greater rate of transcription of gene 1 from duplicate locus 1 and 
a greater transcription rate for gene 2 from locus 2. This can be achieved 
by assigning different affinities for the helper proteins h in the two contexts 
labelled by i in equation (|3]). A detailed derivation of how this emerges is 
provided in [Appendix A. 4 and Figure |4] makes the modelling assumptions 
explicit. 

3.2. Dual regulation as a consequence of changes to activation domain 

The fold-change function ([2]) determines whether the influence of a 
protein on gene expression is that of an activator or a repressor, indicated by 
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Figure 4: Binding strengths and cis-regulatory mutation leads to the heterozy- 
gous switch. The two proteins expressed by the two alleles/duplicate genes are shown in 
vertical and horizontal stripes bound to their DNA binding site indicated by the target of 
the regulatory arrow. They have complementary binding strengths to the helper proteins 
Ci,2 appropriately shaded and further indicated by the arcs between the proteins - the 
dark arcs denote greater strength of association compares to the dashed ones. The sites 
marked X are where complementary loss of binding occurs upon mutation, as is postulated 
in the subfunctionalization model [14j . 



the sign of its derivative with respect to the amount of transcription factor. 
The derivatives of with respect to ai^2 is given by 



9/3ai \ ^ , , . tiiti>2{r(,i-ri2) 
'^£(Q,t£,r^) - 



/ ra - rio 
I — r + a2 

re2 - reo 

ai 



d/da2 J ' " (l + tn«i+i^2«2)2 

V ta{rii - ri2) 

(11) 

flagging the possibihty of non-monotonicity of '^i. This non-monotonicity 
means that increasing the amount of a transcription factor produces a fold 
change of transcription that increases (activates) in one context and de- 
creases (represses) in another, a feature called dual regulation. In this model, 
the context is set by the concentrations of the paralogous protein. We con- 
sider the case r^i > r£2, where the Ai binds more strongly to the Pol II 
enzyme than A2 (ignoring enhancer context I for /i = in ([3])) and find that 
A2 behaves as an activator at low levels of Ai and a repressor when the Ai 
level ai crosses a threshold |42t I43j . This repressor-like behaviour of A2 (the 
activator with weaker affinity to Pol II) occurs even when A2 is a facilitator 
of transcription by itself. Since access to the binding site on the DNA is a 
limiting resource, with both activators competing for it, increasing the levels 
of the weaker activator hampers the overall efficiency of transcription from 
the combined (^1, ^2) system. Unequal binding affinity to DNA target sites 
{tiijt£2 7^ 1) changes the amount of regulator Ai that must be present for the 
crossover behaviour to occur. Such dual regulation can also be a property 
of activators expressed from a single polymorphic locus. 

This feature, that duplication of an autoregulatory gene can introduce 
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competition for regulatory sites on the DNA and lead to dual regulation, 
opens up the possibility that novelty can arise due to mutations to modular 
components of one of the duplicated proteins which alters rjj = exp(— /3e^^.p) = 
rj or exp{—(3ehAjp) or even the DNA binding energy of the helper protein as 
seen m eq. (|3j). In all of what we discuss in this paper, the key parameters 
that we change are the rij in eq. ([3| - the arrow connecting transcrip- 
tion factor to the "helper" in Figure]!] - while we keep tij to be the same 
across the two genes of interest, just for simplicity. The r^j do depend on 
protein-DNA binding strengths, but only for the helper proteins, as shown 
in Appendix A. 4 



4. Changes to dynamics of the network — switches and backup 

The autoregulatory circuit behaves like a switch because the dynamical 
system Q supports 2 stable fixed points for a given choice of parameters. 
Initial conditions specifying protein levels on either side of a threshold value 
drives the system to its low or high expression state. In this section we 
consider the cases where having two genes coupled via feedback gives rise 
to different types of switches, referred to in |44j as a progression switch 
(where two genes can move from low to high expression states) or a decision 
switch (where the genes switch to low-high or high- low expression states, 
the simplest example of a choice of regulatory fate) . 

4-1. Homozygous or progressive switch: lowering the on- off threshold 

Duplication of a gene has typically been associated with increase of pro- 
tein product. If the gene were part of a switching circuit with a phenotype 
that is binary- valued, as is the case for networks that convert transient, 
threshold-crossing inputs into stable outputs, it might well be the case that 
doubling of a gene does not greatly increase protein product, but merely 
makes the threshold more accessible for crossing. We illustrate this possi- 
bility by setting rij = r and Aj = A = 1 and Cj = c = 1 in Q. This 
is equivalent to a single gene switch but with tij doubled. As expected, 
altering the dissociation constant for an autoregulatory circuit changes the 
threshold in a sigmoidal function. The steady states are solutions to a cubic 
equation with two stable fixed points separated by a saddle. The bifurcation 
from a mono-stable to a bistable state is of a saddle-node variety [45j . The 
nuUclines, bifiurcation plots and histograms of expressed proteins generated 
from a Gillespie simulation make the point vividly in Figure [5j This sub- 
section is presented only to make the comparison with the asymmetric case 
obvious. 
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(a) (b) (c) 

Figure 5: The location of a transition from a low expression state to a high 
expression state is lowered upon duplication with no further divergence. On 

the left are histograms of the expression levels of the duplicated self-activating gene, with 
the bifurcation from one to two to one stable steady state for changing r^j = r from 1 



through 10 to 20 (from top to bottom). All other parameter values are as in Appendix 

[b] with c = A = 1, and the activation strengths of the (identical) proteins taken as 
r X ro-The location of the modes (specifically shown for the histogram in the middle) 
correspond to the intersections of the (dotted, dot-dashed) nuUclines indicated by the big 
black circles on the right. The vertical lines in (b) correspond to the fixed points for the 
single autoregulatory loop, with the smaller dots indicating steady state levels there. The 
open circle on the vertical line in the middle and the intersection indicated by the arrow 
are the locations of the thresholds in the single and duplicate gene cases respectively. In 

(c) the abscissa denotes increasing values of r while the ordinate is the expression level. 
The characteristic S-shape of a saddle node bifurcation with hysteresis is seen. The X1-X2 
planes in this figure are in logarithmic scale. 



4-2. Heterozygous or decision switch via complementary loss of recruitment 

In this case, we note that the two duphcate proteins recruit Pol II with 
the help of complementary helper proteins at the two loci. Gene copy 1 
is activated by protein 1 at a rate that is greater than that achieved by 
duplicate protein 2, rn = r x ri2 with r > 1. The roles are reversed for 
expression from gene copy 2, with r22 = r' x r2i with r' > 1. The detailed 
origin of these parameters lies in the loss of complementary regulatory bind- 
ing sites of two helper proteins that have complementary binding preference 



to proteins of the two alleles/genes, as explained in Appendix A. 4 They 



are summarised in the Figure [4j The combined strength of recruitment of 
the transcriptional machinery is partitioned (as shown in Figure |4| and in 
detail in Appendix A. 4) via loss of binding sites for the helper proteins. 
For simplicity, we shall take r2i = ri2 and r = r'\ the symmetry does not 
introduce any non-generic features to this dynamics, but makes the analysis 
more transparent. 
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The fixed points are now given by intersections of cubics obtained by 
setting {d/dt)xi^2 = in eq. Q, where we ignore the downstream sector. 
We find multi-stable solutions in this case - a tri-stable state with (xi,X2) 
levels being (low, low), (low, high) and (high, low) which undergoes a change 
via a pitchfork bifurcation upon increasing r to a bi-stable state where the 
low expression state is lost and only the mutually exclusive expression states 
(low, high) and (high, low) for (x^,X2) remain. Figure [g] illustrates this. 

We have thus far studied the effects of complementary mutations fol- 
lowing duplication that affect r^j by affecting protein-protein interactions at 
the enhancers and thus etAjp in eq. ([s]) , where helper proteins indexed by h 
provide transcriptional response specificity in this modelling framework, h 
is a place-holder for the cis-regulatory context of gene expression. In order 
to implement the complementary loss-of- function mutations of @], we have 
the helper protein enhance the recruitment of transcriptional machinery in 
complementary ways. Starting from = {rio,rii,ri2) with equal rn and 
ri2, we end up with a situation where transcription from locus 1 is greater 
for the coding region 1 (say) than the coding sequence 2 (rn > ri2) and 
similarly, r22 > f'21 (details in Appendix A.4). We compare the two cases 
of progressive (homozygous) and decision (heterozygous) switches [S] in 
Figure [7| 

We use the expression level of a downstream gene as an indicator vari- 
able (abscissa in Figure [?]) to locate the influence of the control parameter 
r, the ratio of activation strengths of the two transcription factors (ordi- 
nate in Figure [?]) . The filled circles correspond to stable fixed points of the 
expression of a downstream gene activated by the pair of duplicated acti- 
vators and the open circles are those that correspond to the unstable fixed 
points identified in Figures [5| [6j In Figure [7] (c) we show that the key differ- 
ence in the two cases (now the open circles correspond to the closed circles, 
or stable points in (b), and the filled circles are the stable steady states 
in the duplicated- no-divergence case of (a)) lies in the extent of hysteresis 
that the system affords. The doubled autoregulator shifts the threshold for 
entry into the high-expression state lower compared to the pre-duplicated 
gene. The asymmetrically expressed decision switch case is compared to the 
pre-duplicated situation in Figure [7] (d) , with barely any difference in the 
expression levels in the two cases. 

The doubling of the self-activating gene comes with an obvious corollary 
in that the system retains one copy of the switch upon deletion of one of 
the two copies of the gene. Thus, even in the case that a gene is in a low- 
expression state as in Figure|6](b), the deletion of its high expression partner 
induces its upregulation. Such a back-up feature points to the plausibility 
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of invoking robustness in differentiating between tlie systems yielding the 
two expression levels in Figure [7] (d) , which are otherwise indistinguishable. 
This stable dynamical state is of course, unstable from an evolutionary per- 
spective unless the conditions that lead to the potential loss of a paralog are 
heritable as well. 

An hypothesised original autoregulatory developmental gene has been 
used to replace two copies of the hoxla,h paralogs in a mouse with conse- 
quent minor alterations to normal development of its forebrain[13j. However, 
in the wild- type mouse, the prominent phenotypes that develop upon knock- 
out of a paralog means that this back-up facility that is a consequence of 
duplication has been lost, via the loss of the hoxal autoregulatory site, and 
acquisition of further specialised roles for the proteins. This dynamically 
stable state of the duplicated autoregulator is rendered evolutionary stable 
by the divergence of the pleiotropic roles taken on by the duplicates, which 
are split and stabilised by the model of subfunctionalization [1] |36] . An ex- 
ample of a developmental system where genetic buffering upon deletion of a 
paralog is observed is the myoD and myf5 pair involved in myogenesis where 
deletion of the pair induces the upregulation by its partner [15]. However, 
this appears not to be a cell-autonomous property: results from clonal lines 
in vitro do not show the same effects as in the organism, and further stud- 
ies [U] indicate lineage-specific divergence of gene function, with buffering 
occurring at an inter-cellular stage. 

5. Emergence of oscillations due to changes to activation site 

The standard model of gene activation [32] involves recruitment of Pol 
H and other transcriptional proteins by protein-protein interactions with 
the activation domain of the transcriptional activator. Increasing the ac- 
tivity of the (acidic) activation domain is also accompanied by an increase 
in the protein degradation rate [H], a correlate that is key to the results 
below. Mutations to the activation domain to either copy of the duplicated 
gene/allele can thus introduce a change in the time scales of the dynamics 
of the two proteins. If we view the enhancer region of the gene as the source 
from which proteins are produced, duplication followed by alteration of the 
activation region introduces competition for this source. Such a competitive 
framework arises in ecological theory where one species can take over a food 
source or habitat at the expense of another - a property called competi- 
tive exclusion. However, it has been shown [l9] that it is possible for two 
competing species of predators to subsist on a single species of prey (a food 
source) in an oscillatory mode. This analogy extends to our model as well. 
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For the case where the divergence between copies is modular both en- 
hancers are taken to be copies of each other with tij = 1 and the occupancy 
of the promoters are the same, = f2 = f - In detail, modular activation 
implies rjj = roj, independent of genomic locus i. Thus, we take 

rj = (?'i0,rii,rj2) = (roo,?^oi,r-o2) =: (?'o,n,r2) 

in this section. The fixed points ( easily obtained by setting 

{dldt)(xx,X2) = (0,0) in eq. ^ to find {xl/xD = (C1A2/C2A1), so that 

cMx\, (c2Ai/ciA2)xl) - Aix^ = (12) 

is a cubic equation with either 1 or 3 real solutions. Since we can set a 
rescaled time variable, there are principally 3 parameters that determine 
the different dynamical outcomes of this model. We shall adjust the ratio 
of the decay rates A = A1/A2 to capture the different time-scales for the 
two copies, the ratio c = ci/c2 which includes the number of binding sites 
and can be used to model diploid versions of the duplicated haplotype, and 
r = ri/r2 the relative affinities for the activators to the transcriptional 
machinery. 



5.1. Oscillations by local and global bifurcations 

The stability of a fixed point is usually investigated by the behaviour of 
the vector-field in its vicinity, i.e., by observing how the system responds 



to a perturbation about that point. Using the factorisation eq. (10) of the 
Jacobian of the full system eq. Q we focus our interest on the two-activator 
subnetwork as being the generator of novelty rather than the dosage depen- 
dence downstream. In particular, we seek out the conditions for a Hopf 



bifurcation to find oscillatory solutions to the equations (see Appendix C), 
using Sylvester resultants [50j . 

There, we find two conditions for a Hopf bifurcation, by factoring a 
polynomial constraint into two pieces. First, 



c r2 -ro 



A V 1"! — fQ 

which implies that one of the two genes recruits Pol II more efficiently than 
the basal rate while the other's activation rate is less than that of basal 
transcription - i.e., one is an activator, the other a repressor. This topology 
was proposed in [51] and implemented in [52] and shown on the right in 
Figure |9j We do not deal with this case in this paper. 
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The second factor in the polynomial constraint obtained by the Sylvester 



resultant in Appendix C gives rise to the model that we work with in this 
paper, shown on the left in Figure [9j This factor gives a surface in the 
3-dimensional space determined by ((r2/ri), (c2/ci), (A1/A2)) as shown 
in Figure [S] for chosen values of A2, C2 and tq. This demonstrates that 
a two-activator system can sustain stable oscillations and illustrates how 
topology is an insufficient predictor of network function. As suggested by 
the observation of dual regulation by X2 as a function of xi levels, low- 
expression states participate in a positive feedback loop that maintain a 
stable steady state, whereas for high xi levels X2 behaves like a repressor 
and the oscillatory behaviour of positive and negative loops is observed. 
The subset of the parameter ranges for which the system oscillates in 



Figure 11 (a) where the light grey and dark grey (red online) regions show 
different kinds of stable dynamics. The dark (red online) region displays a 
region where there is only one fixed point with stable oscillations as shown in 



Figure 11 (b) even for a stochastic version of the model (see below). The light 
grey region is where a stable limit cycle and a steady state coexist. To give an 
indication of the biological plausibility of such a mechanism, the biophysical 
parameter of interest is the ratio of binding rates to the transcriptional 
machinery, r = ri/r2. To generate oscillations, a value around r = 50 is 
sufficient for small values of c. This translates into a free-energy of binding 
differential of AAG ~ 2.3 kcal/mol. 

We also point out that there are global bifurcations in this model that 
lead to the emergence of oscillations upon parameter changes, as shown in 



Figure 10 



5.2. Coexistence of one stable equilibrium and one oscillatory state 

The presence of a saddle node separating a stable steady state and an 
unstable fixed point with complex eigenvalues enables the system to be at 
either of these two states. Below the threshold set by the saddle, the system 
settles into the stable steady state; above it, the system oscillates. A saddle- 
node bifurcation arises when the stable fixed point and the saddle merge 
jJSj (the light-dark (grey- red online) boundary in Figure [TT|a) ) , leaving the 
system to exist only its oscillatory state. 

The coexistence of an equilibrium state and a stable limit cycle (the light 



(grey) volume shown in Figure 1 1 ) enables oscillations to be annihilated by 
the suitably chosen perturbation [53j as well as for rhythms to be switched 
back on, but with a reset phase, by a threshold-crossing perturbation around 
the stable fixed point. Such a mechanism underlies a temperature compensa- 
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tion scheme in circadian clocks in flies [5l] ; suggesting plausibility arguments 
for fitness parameters that favour such a phenotype. 



5.3. Noise-driven oscillations and bursts 

The coexistence of stable equilibrium and stable limit cycle dynamics 
in the deterministic model suggests that a stochastic version of the model, 
incorporating intrinsic noise of biochemical reaction steps that constitute 
the network, will show switching behaviour between these phenotypes. The 
stochastic kinetic model (which is a simplified reduction of the detailed set 



of molecular reactions presented in Appendix A. 3 ) contains a "volume pa- 



rameter" which enables the translation from (nano-)molar concentrations 
used in deterministic chemical kinetics models implemented as ODEs and 
numbers of molecules that are present in cellular volumes. We have gener- 
ated sample paths using Gillespie's algorithm [55] for different values of 17 
to observe the extent of noise in the trajectories as well as the frequency of 
switching between these states. We illustrate the occurrence of noise driven 
transitions in Figure [TT|c,d). By modulating the threshold by an external 
input it would also be possible to generate frequency modulated bursts of 
oscillations, an additional mode of dynamical processing made available to 
the cell via such a double-activator subnetwork. 

5.4. Population levels are at steady state even when cells oscillate 

The bursty oscillatory character of the system reflects noise-driven switch- 
ing between a stable steady state and a high mean-expression oscillatory 
state. The random nature of the switching resets the phase and the lack 
of coherence in an unsynchronised population between individual oscilla- 
tors results in very low amplitude oscillations. This averaged behaviour is 
not signiflcantly different from a steady state population response. Further- 
more, univariate histograms for the system that displays stable oscillations 
or oscillatory bursts display bimodal distributions of proteins reminiscent 
of switches. Thus, it is plausible that a population of cells could present 
a responsive interface to environmental inputs, such as driving metabolic 
processing of nutrients in a manner that is neutral, and hides the novel time- 
dependence in individual cells. This might enable anticipatory responses to 
periodic environmental cues by processes further downstream to develop and 
manifest themselves, or indeed be stabilised via duplication events in diploid 



species (see below). The smaller parameter range shown in Figure 11 when 
the steady state is lost and only the oscillatory state remains indicates that 
it is possible for the system to mask the potentially harmful acquisition of 
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oscillatory instability and commit to oscillations under the future favourable 
environmental conditions for its subsequent selection. 



6. Evolution of duplicates for the oscillating phenotype 

The outcome of introducing a duplicate mutant will be to either success- 
fully invade the resident population of singleton alleles or die out. In this 
paper we do not explicitly model the effects of drift, but we note that in 
small populations, selection coefficients much smaller than [l/Nf.), where A'^g 
is the effective population size will get swamped by drift. Moreover, while 
there were two scenarios explored in the dynamics of regulation - a switch- 
like or an oscillatory phenotype ~ we shall only explore the evolutionary 
consequences of oscillatory dynamics and its role in the spread of the dupli- 
cate gene in this paper. Recall, the fitness coefficients in ^ are organised to 
explore the consequences of increased fitness arising from unequal activation 
strengths of the two alleles of a self-activating gene upon the potential for 
retaining gene duplicates. In the singleton case, homozygotes ai/hi or 02/^2 
(note: 02 and 62 are the same allele) have ri = r2 and ci = C2, and hence the 
phenotype is non-oscillating. The allocation of fitness contributions (1 — s) 
and (1 — t) imply that there is positive selection for the heterozygote single- 
ton. While this may arise for a variety of factors, we have identified the onset 
of oscillations in both deterministic and stochastic versions of the regulatory 
network model as the main qualitative difference. Hence, we investigate the 
potential for the duplicate to increase in frequency under conditions that do 
not require additional fitness advantages to doubling c := ci/c2 ([s]), setting 
d = in ([2]). (One can see in Figure 11 that increasing c = ci/c2 pushes the 



system deeper into the oscillatory region, and indeed, can drive a Hopf bi- 
furcation, as seen in Figure [Sj) One question we investigate here is whether 
halving c in ([s]) in the genotypes ai6o/o2&2, 02^0/^1^2 and a2^o/o2^i! which 
has the phenotypic effect of reducing the propensity for oscillations, affects 
the potential for the duplicate gene to spread. For this, we set u = t + u' , 
where n' = is the condition that it matches the fitness of homozygote 
singleton 0262 genotype, and do a perturbative analysis around u = t. 

Since both 2-by-2 recombination matrices in ^ are positive, their largest 
eigenvalue has a unique corresponding eigenvector (called the Perron eigen- 
vector) which has all entries of the same sign (taken positive), by the Perron- 
Frobenius theorem. If the corresponding eigenvalue is greater than 1, the 
linear combination of haplotypes defined by the entries of the Perron eigen- 
vector increases in frequency due to the process of recombination with the 
existing singleton haplotypes in equilibrium. 
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The Riix2i recombination map has a largest eigenpair (A,v) to first 
order in (t — u) and d: 




A-1 

(13) 



This shows the relative evolutionary consequences of dosage balance be- 
tween the alternative alleles oi and 02 by noting the unequal contributions 
of = 2[(p2] and 2[(j)i] = [02] to fitness. If the phenotype of oscillations is 
favoured, we have [cpi] = 2[02]> whereby M^io;2i = W^i0;i2 = W^20;ii = 1 + d 
with d > 0; d > indicates the positive benefits of dosage imbalance and 
thereby increasing the amplitude of oscillations (data on amplitudes not 
shown). If 2[(pi] = [02], c — )■ (c/2) upon duplication, which reduces the like- 
lihood of oscillations (see Figure [s]) . If the value of c were sufficiently high 
preceding duplication (due to the various factors in eq. ([5])), this might yet 
permit the system to retain an oscillatory phenotype. leading to the condi- 



tion t > u. Hence, eq. (13) shows that no additional benefit has to accrue 
to the oscillatory phenotype {d = 0) for the duplicate to propagate if pre- 
duplication allelic divergence is selected for|2 H [T71 [5]. The high-amplitude 
oscillatory state where [(pi] = 2[(/)2] may indeed be weakly selected against 
{d < 0) and yet enable to the spread of the duplicate. Consequently, the pro- 
portion of the duplicated allele aibi is reduced compared to 02^*1 as evident 
from the Perron eigenvector. This reduction is larger for tightly linked genes 
(small p, < p < (1/2)). This suggests different consequences for duplicated 
genes whether they arise by tandem duplication or by retrotransposition. 



7. Discussion 

In this paper, we find that duplication of an autoregulatory gene can lead 
to competition for common genomic binding sites. Assuming a cooperative 
mode of regulation described via, but not requiring, a dimeric mechanism, 
we find that a dual regulatory behaviour - context dependent activation 
or repression - ensues. Feedback amplification of dual regulation gives rise 
to switching and oscillatory behaviour. While a minimal model like the 
one we have chosen involves an autoregulatory loop, we anticipate similar 
qualitative changes to emerge in the more general context of duplication 
of the node closing a larger feedback loop. The model displays mutually 



27 



exclusive expression levels under appropriate contextual cues, analogous to 
those specified by the subfunctionalization model for retention of duplicates. 
When the 2-node double-activator subnetwork is viewed as alleles in a diploid 
species, allelic divergence of expression emerges in this case. We note, in this 
context, that partitioning of expression of different alleles occurs in Fi hybrid 
cotton plants for the alcohol dehydrogenase Adh gene |56] . Moreover, we 
also find that the ensuing circuit is capable of back-up of expression of a 
deleted gene by its paralog kafri-pnas06. This is linked to developmental 
systems where such a phenomenon has been observed. The evolutionary 
instability of this genetic buffering has not been modelled, however, as we 
have not included the effects of mutational loss in our evolutionary analysis. 

We have constructed fitness functions that favours the heterozygote, im- 
plying that coupling to periodicity in the environment enhances the selective 
advantage of an organism bearing such a duplicate. In an in silico evolu- 
tion experiment it has been shown that in a periodic environment increased 
metabolic flux is obtained via glycolytic oscillations when compared to a 
non-oscillating response [57j. Further, it has been experimentally realised 
that hexose transporter genes regulating influx of sugar is crucial for the es- 
tablishment of yeast glycolytic oscillations [58j , and yeast evolves to outcom- 
pete ancestral colonies in nutrient-limited conditions by duplicating hexose 
transporter genes including HXT6/HXT7 [59j. The coupling of oscillatory 
phenotype to environmental cues has been most often discussed in the con- 
text of circadian rhythms |60t |6T] . The results of |22j show that increased 
biomass yields of hybrids [S] - Fi (allopolyploid) crosses of two Arabidopsis 
species - was correlated with increases in amplitude of circadian rhythms of 
clock and clock-controlled genes. While Fi hybrids are often evolutionarily 
unstable, our simple evolutionary model shows the spread of a duplicate 
in such a system with over dominance. Circadian rhythm generating mecha- 
nisms have been much studied, and it has been shown in mammalian systems 
that competitive binding to common conserved cis-regulatory regions (such 
as the E/E'-box, the RRE and the D-box in mammals [62j) drives circadian 
rhythms, and that its removal reduces the amplitude of oscillations. While 
this appears to be consonant with our findings, it should be pointed out com- 
petitive binding is the sole mechanism in our model, unlike in the circadian 
clock. The competitive mechanism for the generation of oscillations relies 
on the increased efficacy of activation of one of the activators feeds greater 
proteolytic rates [12163]. The opposite conclusion to "competition aids os- 
cillations" is drawn in numerical analysis of models of synthetic oscillators 

Our model displays the role of copy numbers of alleles in increasing the 



28 



amplitude of oscillations (via increasing c). Consequences of gene duplica- 
tion have been considered in the context of dosage balance. The ubiquity of 
dominance has given rise to the hypothesis that dosage balance is favoured 
[8]. On the other hand, in the case of the oscillatory dynamical states 
discussed, our assumptions favoured a heterozygote - that under some en- 
vironmental (typically periodic) conditions there is selective advantage to 
oscillatory behaviour - leading to the increase in frequency of gene dupli- 
cates with genotypes with an even larger amplitude for oscillations. On 
a large-scale analysis, it has been noted that there is an increased frac- 
tion [25] of duplicated yeast genes as a result of whole genome duplication 
amongst those that cycle during metabolic oscillations [SS] , and [IS] identify 
the extensive participation of paralogs in multiple rhythmic processes as a 
partitioning of the oscillatory feature amongst duplicates. We leave for a fu- 
ture investigation the theoretical analysis of duplications in such regulatory 
model systems. 

Appendix A. Model construction 

In this Appendix we describe, first, the thermodynamic model of tran- 
scription rates, then the detailed kinetic model that facilitates the intro- 
duction of the stochastic model used for performing the simulations. We 
also introduce the context dependent model which takes the influence of 
cis-regulatory site information in the rate of transcription. 

Appendix A.l. Thermodynamic model of promoter occupancy 

Transcription factors bound to the enhancer recruit RNA polymerase 
and the thermodynamic formalism can easily accommodate different bind- 
ing free energies of protein-DNA and protein-protein interactions in a uni- 
form manner. In this paper, we summarize the composite effects of tran- 
scription factor-Mediator and Mediator-RNA polymerase binding by energy 
terms e^^p for each activator Ai. Hereafter, we denote by P the transcrip- 
tional apparatus involving general transcription factors, the Mediator com- 
plex and RNA Pol II. In the standard way to count different possible con- 
figurations (see [33] for a pedagogical introduction in the context of gene 
regulation) we introduce Boltzmann factors for all possible configurations 
for binding of activators Ai^2 and P to calculate the partition function 
ZtotiP, Ai, A2)- A subset of these configurations are poised for transcrip- 
tion i.e., those with RNA polymerase or Pol II bound to the promoter. The 
rate of mRNA synthesis is taken to be proportional to the probability of 
the Pol II bound promoter, which is taken to be the ratio of the Boltzmann 
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factors for the favourable configurations with promoter-specific bound Pol 
II to Ztot{P,Ai,A2). 

The binding energies for non-specific (site-specific) binding are denoted 
e^{e^), with appropriate subscripts which identify the binding of activators 
or the polymerases to the DNA. For the case where k polymerases, / activa- 
tors of type 1 and m activators of type 2 bind to the czs-regulatory region 
of the DNA their energy contribution is 

E%k,l,m) := fc(e^^ + Ze^ip + "T-^Aap + ^^"£^1712?)+ ^^^^^ 

where the subscripts indicate the protein-protein binding energies as well, 
including eAiA2P which captures the net energy of recruitment of transcrip- 
tional machinery due to the combined action of the activators. If the proteins 
bind to non-cognate sites, the binding energy is 

E'^ik, /, m) := feO^ + 18%^ + me%. (A.2) 

For P the number of Pol II molecules, Ai^2 the number of transcription 
activators of each type, we introduce 

CiP,A„A2) = e-f^^''i^'^-^^\ (A.3) 

where the right hand side includes in the exponent (3 = 1/(A;bT), and the 
trinomial coefficient contains the number of binding sites in the genome Nns- 
We shall simplify the partition function for or 1 molecules of type P, Ai^2 
bound to the relevant DNA sites 

Ztot{P,Ai,A2)= Yl aP-k,Ai-l,A2-m)e-^^'^''''^^ (A.4) 

{A:,i,m)G{0,l}^ 



using the (Stirling) approximation 

Nns \ . . N^s^y^^ 



X,Y,ZJ X\Y\Z\ 



if iV„s »X,y,Z, (A.5) 



and the following definitions (A.6), 



= -^e '^^^^^'^ ri = e"^^^»f, i = l,2 

p = g-P'^^ld~'^vd\ ri2 = e~^'^^i'*2P, UJ12 = e'^^-^i-^a 

Nns 



(A.6) 
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We obtain 

Ztot{P,Ai,A2) 



l+p+(l+pri)ai + (l+pr2)a2 + (l+pri2)u;i2Q;iQ:2, (A. 7) 



which enables us to compute the probability of occupancy of the promoter 
by RNA Pol II: 

C{P -l,Ai-l,A2-m)e-^'^^^^^^^"'\ (A.8) 



Ztot{P,A^,A2) 

((,m)e|0,l|^ 

Finally, we can express the transcription rate as proportional to probability 
of promoter occupancy by P: 

^ _ p (1 + nai + r2a2 + ri2a;i2Qi«2) ,^ gx 

(1 + ai + 02 + uJi2aia2) + p (1 + riai + r2a2 + ri2Wi2aia2) ' 

Phenomenologically, such an expression is matched to experimental data on 
amplification or reduction of mRNA production as a function of transcrip- 
tion factor numbers, called the fold change function, ip: 

p(l + nai + r2a2 + ri2t^i2aia2) /aia\ 
V'(ai,a2) = 7— ^ ^ • (A.IO) 

(1 + ai + 02 + uJi2aia2) 

For the case of competitive binding of Ai and A2 to the same promoter site, 
we shall set W12 = hereafter. 

Appendix A. 2. Context-dependent recruitment 

In order to introduce cis-contcxt dependent regulation beyond that cap- 
tured by Ai^2 binding sites, we will need to introduce additional binding 
sites. We introduce two additional factors A^ and A^ but we assume that 
it only alters the transcriptional ability of Ai^2 by site-specific and protein- 
protein interactions. We proceed in exactly the same lines as before to end 
up with 

ip{ai,a2,a3,a4) = (A.ll) 

(1 + Ei=i + Efe=3 '^ikOiiOik) 

and the promoter occupancy is, as before, 
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In what follows we shall restrict ourselves to the situation where ^3 and 
A4 does not initiate transcription on its own but merely acts as helpers to 
Ai^2- Thus we can set ra = = r4 and end up with a context dependent 
fold-change factor 

,(c)/ X /0(1 + ai(n +n3'^13a3 + n4Wl4a4) + a2(?'2 + ?'23W23a3 + +'^24W24a4)) 

(1 + ai(l + ujrsois + ^140^4) + ^2(1 + ^23<^s + ^24^4)) 

(A.12) 

which enables a greater flexibility in setting differential rates of recruit- 
ment based on both protein-protein interactions Wjj, sequence-dependent 
transcription factor-DNA binding (oj) as well as recruitment of the tran- 
scriptional machinery (rj, rij). 

Appendix A. 3. Kinetic model and reduction 

In order to perform stochastic simulations it is convenient to also in- 
troduce a detailed kinetic scheme which reduces to the result derived from 
thermodynamics under detailed balance of the fast reactions to be indicated 
below. As in the thermodynamic description, we shall describe the details of 
the two activator case and extend the formalism to include the third auxil- 
iary transcription factor as that does not play a dynamical role in the paper, 
but merely motivates the parameterization. 

For this section it is convenient to introduce a and h as labels that refer 
to the two genes, and A and B to denote the two proteins that are expresed. 
We shall also refer to the quantities of A and i?-monomers by [A] and [B] 
and their dimeric forms by [A2] and [B2] respectively. Later on we shall also 
introduce the variables x and y which are scaled versions of [A] and [B] . To 
match the (xi,X2) variables in the paper, we note that x = xi and y = X2- 



To connect to the thermodynamic description in A. 9, qi^2 will be taken to 
be dimers [A2], [B2] and will be represented by x'^ and in the Appendix 
, and to repeat, is called xi^2 in the main paper. 

The set of reactions that we consider are given in the table below. Note 
that the reactions below the horizontal line in the table are excluded, and are 
presented only to indicate which reactions occur with vanishing probability 
because of the steric inhibition of the transcription factors. In order to 
relate the kinetic description under detailed balance to the thermodynamic 
description, we set 

UJ12 = (A.13) 



in A. 9 to impose the mutual exclusivity of binding to the enhancers. 



32 



Reactions 



Rates 



2A ^ A2 

A2 + DA ^ DAA2 
A2 + DB ^ DBA2 
B2 + DA ^ DAB2 
B2 + DB ^ DBB2 
DA + Pol ^ DAPol 
DB + Pol T± DBPol 
DAA2 + Pol ^ CAA2 
DAB2 + Pol ^ CAB2 
DBA2 + Pol ^ CBA2 
DBB2 + Pol ^ CBB2 
DAPol ^ DA + m„ + Pol 
DB + mfi + Pol 
DAA2 + ma + Pol 
DAB2 + ma + Pol 
DBA2 + rrif, + Pol 
DBB2 + mb + Pol 
DA + ma + Pol 
DB + mb + Pol 
A + nia 



i^dim,B 

(.^db,A 
i^da,B 
(.^dh,B 
(PaO 

iPto 

(ptA 

(pIb 

(ptA 
(ptB 



Propensities 



(^dim,yl^i/^' ^dirn,A"'A2) 
i^dim,B''^B/^^ ^dim,B)"'S2 

{kdb,A^^2^DB/^j k^l)^j\)'^DBA2 
{^^da,B'^DAnB2/^, k^a,B'^OAB2) 
i^db B^B>b'^B2/^i k^l) B^DBB2) 
{PaO^DAn-Pol/^, PaO^DAPol) 
{P^Qf^DBlT-Pol/O,, Pf^rioBPol) 
{P^A^B>AA2'npol/^, PaA^CAA2) 
{PaB^B>AB2''T'Pol/^j PaB^CAB2) 
{PbA^DBA2npol/^, PbA^c;BA2 ) 

{Pb^B^oAnpoi/^, Pbj^noAPoi) 

Pbfl 
Pa,A 
Pa,B 
Pb,A 
Pb,B 
Pa,Q 
P'bfi 
TTa 

Smb 



DBPol 
CAA2 
CAB2 
CBA2 
CBB2 
DAPol 
DBPol 
nio 
nia 

rrih — )■ i? + rrih 
mb -)• 



dim, a) 
,u > 
'dim,B) 



^da,A) 

Kib,A) 

Kia,B) 
hu \ 
'^db,B) 

Pap) 

Pbo) 

Poa) 
Pob) 

PbA) 
PbB) 



P-afl 
Pbfl 
Pa,A 
Pa,B 
Pb,A 
Pb,B 
Pafi 
P'bfi 
TTa 

Smb 

Aa 

Ab 



B2 + DAA2 
A2 + DBB2 



DAX 
DBX 



(Ub uu ^ 

(hb uu \ 
\'^db,X->'^db,X) 



(0,oo) 
(0,oo) 



(0,oo) 
(0,oo) 



Table A. 3: The molecular species/states represented in the reaction scheme are labelled 
by the genetic identities, A and B. DA, DB stand for promoter regions upstream of 
genes a, b. mRNA and proteins are ma. A, etc, while Pol is a shorthand for the set of 
intermediates including Mediator and the RNA Pol II transcriptional machinery. Of key 
significance in this paper is the different affinities of the transcription factors A2, etc 
bound to promoters DAA2, etc have for this transcribing machinery. The states DAA2 
and DAB2 refer to the states of the promoter of gene a bound by A2, B2 respectively. 
The set of reactions below the horizontal line involving states DAX, CAX, DBX, CBX are 
those with promoters bound by both transcription factors. Those states are excluded in 
the XOR case, k^a,x = = k^b^x- We have also set the basal rates of polymerase binding 
to the two promoters the same, pj. The rightmost column gives the propensities for the 
reactions used in the Gillespie simulation. The probabilities for the stochastic case are 
obtained by dividing the rate constants by^^^H where Na is Avogadro's number and f2 
a volume. Under this normalisation, a InM concentration corresponds to approximately 
1 molecule in E. coli and 60 molecules in a mammalian cell nucleus. 



In the reactions in Table A. 3, we make the assumption that all the 
binding-unbinding events in the cis-regulatory regions are much faster com- 
pared to the slow processes of transcript formation and translation. Further, 
we assume detailed balance to arrive at the same fractions for the states cor- 
responding to the bound configurations CAA2, CAB2, CBA2, CBB2 (and 



CA, CB for basal transcription) as in the thermodynamical formulation A. 9 
The rates of transcription of the mRNA species of A and B are: 

4 [^a] = Ma,0 [CA] + lla,A [C AA2] + ^la.B [C AB2] - 5a [nia] 

^ (A.14) 

^ [mb] = fimw [CB] + ^,h,A [CBA2] + fi.^B [CBB2] - 4 [mt] 

where the promoters [DA], [DB] of genes A and B are bound by the RNA 
Pol II and initiate transcript elongation at the different rates ii. We shall 
make the assumption in this paper that transcription elongation takes place 
at a rate that is independendent of promoter configuration and so we shall 

set 

^J'a,0 = fJ-a,A = l^a,B = /^a and ^fc^Q = /^6,A = ^^b,B = l^b- 

In order to match up with the thermodynamic formalism we need to assume 
detailed balance for all the reactions where i> binds to/unbinds from <i to 
form ex] in Table lAlSl 

k+ [>][<] = k-[^] ^ H = ^MM, (A. 15) 

which prompts us to introduce: 

dim, A dim,B /a r\ 

'^dim,A ^dim,B 

for the dimer dissociation constants and 

1,11 i,u uu uu 

'^da,A J. '^db,A '^da,B '^db,B / a i 

^da,A = -Tb ' ^db,A = TJ^, Kda,B = p , Kdb,B = p (A.17) 

'^da,A '^db,A '^da,B '^db,B 

for protein-DNA binding, where the Kdo,» notation stands for • binding to 
promoter of gene o. 

The GRF involves terms of the form {[TF]/Ktf) x r where r incorporates 
the binding probabilities of the transcription factors to the transcriptional 
machinery (such as Mediator, RNA polymerases, etc) and Krpp is the disso- 
ciation constant for protein-DNA interaction (involves exponentials of free 
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energy of binding). The TFs of interest in this model are dimers A2 of 
protein A and from 



A + A 



A2 



'^dim,A 
h.b 

'^dim,A 



[A2] 



. Using the detailed balance rapid equilibrium hypothesis, the dynamical 
variables enter with products of the dissociation constants for protein-DNA 
and dimerisation dissociation constants. 
We introduce 



[Pol]^, r,A := [Polf'^ 



[Pol]^ (A. 18) 

PiO PiA PiB 

where i could take the index values a or 6 and would reflect the promoter 
dependence of the (basal) dissociation rates in question. We further define 
= ^dim^daA and k| = K^.^Kdb,B so that 

X = [A]/ka and y = [B]/kb, 

and the ratios tj^^ = (Kdij/Kdjj), with tu = 1. Thus, t^^ = {Kda,B / Kdb,B), 
Ka ~ {^db,A/ Kda,A) measure the relative strengths of binding of the tran- 
scription factors to the promoter regions of the 2 genes using their autoreg- 
ulatory binding as reference. Now we substitute for the bound promoter 
concentrations in eq. (A. 14) using the relations derived from detailed bal- 
ance to obtain 



nin 



-6a[ma]+f^a[DA][Fol] 



PaO 



+ 



PaA 



PaA ^dim^da,A 



+ 



pZb Kdb,B [BY 



dt 



[rrib] 



PaB ^da,B Kg^Kdb,B . 
-5a [ma] + Pa\B)A\ (raO + TaAX^ + TaBtaBV^) 

-bb\mb\^iib\DB\^o\\ 

PlA'^daA W 



(A.19) 



+ 



PbB 



{BY 



PbA ^db,A K^i^Kda,A PhB K^^Kdh^B . 

= -5fe[mb] + iib{B)B\ (rfeo + nAhA'3?' + nBV^) ■ 

To obtain the expressions for the promoter fractions [DA] and [DB], note 
that the total available promoters are either unoccupied, or occupied by 
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transcription factors and core transcriptional machinery. Thus, the total 
promoter availability is used to obtain the bound fractions, in line with the 
thermodynamic description above. These are obtained as follows: 

[DA]o = [DA] + [CA] + [DAA2] + [CAA2] + [DAB2] + [CAB2] 



[^^] PaO PaA Kfim^daA 

+(l + ^[Pol])-^^^'^ t^]' 



PaB ^da,B Kg^Kdb,B 

= (1 + Tao) + (1 + raA)x^ + (1 + VaB^BV^ = r„ 

[DB]q = [DB] + [CB] + [DBA2] + [CBA2] + [DBB2] + [CBB2] 
[DB]o _ (i + ^[Poi]) + (i + ^[Pol])-^^-^ t^]' 



+(l + ^[Pol]). t^]' 



PbB Kg^Kdb,B 

= (1 + rfeo) + (1 + nA)tbAX^ + (1 + rbB)y'^ = Tb 

(A.20) 

We now write the set of equations determining the kinetics of transcrip- 
tion and translation: 

d , . VaO + raAX^ + raBtB^y^ 1 

dt [l + Tao) + [l + raAjX + [l + TaBjtaBV 

-[A] = TTaK] -Aa[A], 

r 1 rnoi no + nAtbAX^ + nEV^ .r i 

-fMb] = f^b[DB]o---- 2 , n I rT-<^&K]' 

dt (1 + rfeo) + (1 + TbAjtbAX + (1 + rbfi)?/ 

^[5] = 7Tb[mb] - Ab[B]. 

(A.21) 

We then introduce the assumption that the fast decay times of mRNA 
compared to those of proteins enables the translation machinery to effec- 
tively see a steady state level of mRNA [mj]** 

[m.iY'^ = -^mRNA production rate(i), for i = a,b (A. 22) 
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which we substitute in the differential equations for proteins to arrive at 



^ rn/11 TaO + raAX^ +raBtaBy'^ . 

—X = \DA\q- ^ K — A^x 

dt 5aHA (1 + Tao) + (1 + raA)x^ + (1 + TaB^BV 

= Ca(pa{x,y) - AaX, 

— = \DB] ^^^^ ^feo + nAtbAx'^ + rbBv"^ ^ 

dt^ ~ ^'^ 5bKB (1 + no) + {1 + nA)tbAX^ + {1 + nB)y'^ 
= Cbipb{x,y) - AbV, 

(A.23) 

where Ca = and Cb = 

To summarise the correspondence to the notation in the main text, we 
have(xi,X2) o {x,y), itaA,taB,tbA,tbB) ^ (tii,ti2,t2i,t22) and (raA, ^aB, r^^, ^55) ^ 
('"11 5 ''12 5 '"215 '''22), and similarly, the a and b subscripts correspond to 1 and 
2 in the main text. 

Appendix A. 4- Introducing cis-regulatory context and subfunctionalizing mu- 
tations 

This section deals with the case of two transcriptional activators Ai 
and A2 and two helper proteins, Ci and A2- These helper proteins facil- 
itate discussions on cis-regulatory context and parameterize such context- 
dependence. For simplicity, we shall restrict ourselves to the situation where 
Ck does not activate either of the duplicate genes by itself, but modifies 
the recruitment potential of Ai and A2 via protein-protein interactions and 
DNA binding affinity at the regulatory site, incorporating the roles of cis- 



and irans-effects in evolution. Using the fold change function (A. 12) as 
reference, we define 

2 2 
Tjo + {rji + ^rjiCf,tjCkXcJtjixl + (rj2 + X] ^i2CfeiiCfea;cJij2 

(-1,-2) = 

l + il + Y. hCk^cJtjixl + (1 + ^ tjc,xcjtj2xl 
k=l k=l 

(A.24) 

which incorporates the protein-protein interactions between transcription 
factors Ai^2 and helper proteins Ci^2 that are significant for recruitment 
of the transcription machinery: rjic^ stands for the affinity of the Ai-Ck 
protein complex on the DNA regulatory region of gene j to the transcription 
machinery (Mediator, Pol II, etc), tj^^ = K^j^Ck measures the protein-DNA 



X2 
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dissociation constant of Ck to the enhancer of j. We introduce 




Tjk + rjkCitjCiXC:, + rjkC2tjC2XC2 

1+tjCiXCi +tjC2XC2 



) = tjk{l + tjc^xci + tjc^xci) and 



rjk 



(A.25) 



which simplifies the expression for the fold-change to 



ll^j '{Xi,X2) = 



1 + ijixj + ij2xl 



(A.26) 



To evaluate the nature of the regulatory activity of xi and X2 in the 
presence of C we compute the partial derivatives dx^. $i of the probability of 
occupancy of the promoter of i: 



This can be split into two factors: 



for (i, k = 1,2), where the second factor determines the sign of the regulatory 
activity. The dependence of f on protein-protein interactions Vijc^. and 
DNA binding strengths Ucf, illustrates how context-dependent changes in 
the nature of regulation can be achieved. 

In particular, we demonstrate how complementary loss of function mu- 
tations can yield the parameters for the exclusive switching circuit. If all 
protein-protein interactions are modular, in that they occur due to con- 
tact interactions, wc can set Vij — Voj and '^ijCk — '^^jCk • (^Vc denote 
the locus independence, or the lack of an index, as o.) We assume that 
^oiCi = ''02C2 = ■''C and ro2Ci = i^oiC2 = ^fc, for e < 1. Mutations are as- 
sumed to leave the protein-protein interactions unaffected for the czs-context 
cases. To implement the subfunctionalization model we set tic^ = ^202 = ^c-, 
and t\c2 — ^2Ci = eic; thus, two loss of binding site mutations for the helper 
proteins Ci^2 are assumed to take place. These interaction strengths are thus 
of the same order of magnitude as the binding to non-specific sites and can 
thus be absorbed into rjo- Further, we assume both helpers to be of the 
same concentration, to simplify description and analysis: xc^ « xc2 = xc- 



2 



dxk ((1 + + taxi + ii2X2)f 



[hk - no) -h ^(1 - 5kj)ii3x]{fik - hj) 
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Reactions 



Rates 



Propensities 



A2 + DZ ^ DZA2 
B2 + DZ ^ DZB2 
DZ + Pol ^ DZPol 
DZA2 + Pol ^ CZA2 
DZB2 + Pol ^ CZB2 
DZPol ^ DZ + + Pol 
CZA2 DZA2 + m^ + Pol 
CZB2 DZB2 + + Pol 
DZPol ^ DZ + + Pol 
ruz — )• Z + ruz 
niz ^% 



b I.U \ 
dz,B-> '^dz,B) 

iPzQi Pzo) 

(ptA^Pz_A) 
(P^B^Pzb) 

Pz,0 

Pz,A 

pz,B 

Pz,0 
T^z 

^rnz 



{k'dz,A''^A2nDZ l^l ^^Z,A™OZA2) 
ikdz,B''^DZnB2/^, k2z^B''^DZB2) 
{PzO^Dz'^Pol/^l PzQ^DZPol) 

{PzA^DZA2^Poi/^j PzA^c:zA2) 

{PzB^OZB2^Pol/^7 PzB^CZB2) 
Pzfl 
Pz,A 
Pz,B 
Pz,0 
T^z 



Table A. 4: The reactions to be added to the Table 1 to incorporate the action of the 
transcription factors A2, B2 on the target gene z. The notation is analogous to Table 1 
as well. 



Making these substitutions into eq. ( A.25| ) we get 



iij ~ =tij{'i^ + tcxc) 

rok + rokCjtcxc (A. 27) 



1 + tcxc 



The strengths rokCj determine, for the heterozygous switches considered in 
the paper, how the complementary strengths of protein-protein interactions 
for recruitment of polymerases are achieved by setting roiCi > ^^0^02 

Appendix A. 5. Duplicated auto-activator on target gene 

Here we consider the case where an activator gene a activates itself, up- 
regulating the production of protein A and turns on a gene z which expresses 
a protein Z. As explained in the text, this is the typical motif that figures in 
selector genes or terminal selector genes [TT]. We examine the consequences 
of duplicating a so that now we have two copies ai, 02 which are mutually 
and self- activating and also inherit a common target site in z. 

If we impose a similar detailed balance condition to extract the kinetic 
equations from the above reaction scheme, we end up with the following 
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scheme: 

^ = ^T^ 7T - (A-28) 

where x = (xi, X2), and (xi, X2, X3) = ([j4i]/«;i, [j42]/«:2, [Z]), where Ki is the 
geometric mean of two dissociation constants Kd: the dissociation constant 
of dimerization of protein labelled by i and the protein-DNA dissociation 
constant of protein i and enhancer region of gene i;ri = (rjo, Tji, rj2) pa- 
rameterizes the recruiting affinity of transcription factor Ai^2 for the tran- 
scriptional machinery at genomic locus i; tj = {tii,ti2) measures the relative 
strengths of the affinities of the transcription factors to enhancers of i: we 
set tij = {Kdi,Aj/Kdi,Ai), ti2 = {Kdi^A2/Kd2,A2), for i = 1,2,3, so that 
til = 1 = t22- The promoter occupancy probability (/?j(x, rj, tj) is defined in 
terms of the fold change function V'i(x, rj,tj) thus: 



(Pi{x,ri,ti) 
where 'i/'j(x, rj, tj) 



1 



rio + rutiixf + ri2ti2x'^ 



(A.29) 



1 + tiixj + ti2xl 



Note, for simplicity we have used r and t instead oi f, t that we introduced 
to explicitly demonstrate the source of context dependence in the previous 
subsection. 



Appendix B. Parameter values for numerical experiments 

The following parameters were used in the computations. Units of con- 
centration in nM and time in hr. r^o = ^"60 = 1/1000, (5a = 1 = 5b, 
Ab = 1/10, Aa = A X Ab, fJ-b = 2, Ha = fJ-bVc, n = 30, iTa = T^b\r(^-, 
KA = i^B = SvTo K(ia,A = 1/2 = Kdb,B, DAq = 1 = DBq. rij is taken to be 
10 X ro X r where the factor 1 < r < 100 is the ratio rij/rn. which is taken as 
a variable, as are c (1 < c < 10) and A (1 < A < 25). For the switches, both 
heterozygous and homozygous, the ratio A of degradation rates is taken to 
be 1 and so is c for the pre-duplication genotype. 

Appendix C. Hopf bifurcation surface via elimination 

The fixed points of eq. Q, {x'[,X2) = (C1A2/C2A1) = (c/A) which are 
solutions to Ci(p — AjXj = 0. Thus C2f{x2, {c/ A)x2) — A2X2 = is a cubic 
represented as fi below. The eigenvalues of the Jacobian of the (xi, X2) sys- 
tem are evaluated at (x^, X2) to analyse stability; while negative eigenvalues 
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imply local stability, as above, analysis of instability for oscillatory solutions 
involves looking at complex eigenvalues A = a it of the Jacobian of the 
dynamical system at its fixed points. At the fixed point, when the real part 
o goes from being negative (stable) to positive (unstable) as a continuous 
function of a parameter in the system, while |6| > the system is set to 
undergo a Hopf bifurcation and begins oscillating with a period 27r/|6|. For 
a 2-dimensional system, the sum of the eigenvalues at the crossover value 
a = vanishes, making the trace of the Jacobian matrix 0, while the deter- 
minant remains positive in our case. The equations for the tracelessness of 
the Jacobian at the point xi = {c/A)x2- 

- 1 / ^ _ A A ^ 



where the partial derivatives are listed in eq. (|11[ reproduced below: 

02 



^^(a,t = l,r) - ^''-''^ 



d/da2 J ' ' ' ' (l + ai + a2)2 



/ n - rp 

in - r2) 
r2 - rp 

V in - r2) 



Ql 



(C.l) 



Thus the condition for a Hopf bifurcation reduces to solutions of 

Ci- h C2^ — = Ai + A2. 

0x1 0x2 

Using the simplification tij = 1, and upon substituting xi = (c/A)x2, this 
turns out to be a quartic equation /2 in X2 with coefficients being compli- 
cated, but polynomial combinations of the parameters. We compute the 
resultant of /i , /2 to find a non-trivial greatest common divisor (gcd) and 
common root of these polynomials. The resultant is computed via the de- 
terminant of the Sylvester matrix of the polynomials fi, f2- 

3 4 
fi = ^Sixl and /2 = ^ 

i=0 i=0 

So = A^C2ro, 

si = -A2(l + ro)A2, 

52 = (c^r + A^) C2r2, 

53 = - (c^ + A^ + c'^rr2 + A'^r2) A2, f^^-^) 
ho = A4(l + A)(l + ro)2A2, 

hi = 2A^C2 (ro + c^ro - r2 - c^rr2) , 

h2 = -2s3A2(l + A)(l + ro), 

hs = -2c2(r-l)A(A2-l)c2r2, 

/i4 = -53(1 + A) 
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given by 



/ S3 











/l4 








\ 


S2 


S3 








hs 


/l4 







Si 


S2 


S3 





h2 


h3 






So 


Si 


S2 


S3 


hi 


h2 


hs 







So 


Si 


S2 


ho 


hi 


h2 










So 


Si 





ho 


hi 




^ 








So 








ho 


/ 



This resultant factors into two pieces that change sign, and hence contains 
a zero. The first is 

(c2(ri -ro) + A2(r2 -ro))^ 

which vanishes for 

_c _ / r2 - ro 
A V ri-ro' 

This condition implies that one of the two genes recruits Pol II more effi- 
ciently than the basal rate while the other's activation rate is less than that 
of basal transcription - i.e., one is an activator, the other a repressor. 

The second factor is a complicated function h{r, A, c) and we omit it 
here. However, we can plot the surface h{r, A, c) = 0, as shown in Figure [sj 
This is the case that we examine in depth in the paper. 
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Figure 6: The (heterozygous) context dependent dual-activator switches be- 
tween three different expression states. The Xi-X2 planes in this figure are in log- 
arithmic scale. For the case rn — r22 ~ r^, r2i = ri2 = r'2, we obtain multiple steady 
states. Here we illustrate the transition between three and two steady states as the ratio 
r = ri/r2 changes from a value of 10 in (a.l) and (a. 2) to 20 in (b.l). All other parameter 
values are as in [Appendix B[ with c — A = 1, and the activation strengths are multiples, 
r of ro. The two histograms in (a.l) and (a.2) correspond to different initial conditions 
on either side of the arrow drawn in the phase-plane (a) for small noise (large f2). The 
vertical lines in (a) and (b) correspond to the case of the single gene switch. The outward 
point double-arrow in (a) shows the threshold for the single gene case (which is close to 
that of the duplicated gene as well), while the inward pointing single arrow locates the 
only (stable) fixed point for large r in the single gene case. On the right are shown the 
stable and unstable fixed points of xi,X2 (ordinate) for different values of r (abscissa). 
Because of symmetry xi and X2 share the same steady state values, but not concurrently. 
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(a) 



Oo 



(b) 
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(d) 



Figure 7: (Expression levels vs r). The decision switch is indistinguishable from 
the single gene switch in downstream effects. In each of these figures, the abscissa 
indicates the value of r in the two-gene subnetwork. In (a) we have the case of unchanged 
duplicated autoregulatory gene nj = r, while in (b) we have the case rn = r22 = r while 

ri2 — lOro — r2i where the basal rates arc taken to be the same r^o — Tq. The open 
circles in the top two figures (a) and (b) are the values of the target gene z corresponding 
to the unstable fixed points. The stable fixed point values are in filled circles. The stable 
values of z in (a) and (b) are gathered together in (c) for comparison, with the open 
circles correspond to values in (b) , filled circles to those in (a) . Finally, in (d) we compare 
the stable values from the context-dependent case (b) to those from the target expression 
of the pre-duplicated system. Note that if we use the expression of the target gene as 
a readout of the effect of gene duplication, we find no change (see (d)) prior to further 
divergence. For the decision switch, (c) shows that the only qualitative change lies in the 
threshold for switching and hysteresis. 
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(a) 



(b) 



Figure 8: Parameter space where Hopf bifurcation occurs. As explained in tlie 
text, eliminating X2 from /i and /2 for a set of parameter values C2 = 50, ro — 10~^ and 
r2 = 10~^ yields a surface in (r, A, c) 3-dimensional parameter space depicted in (a). The 

In (b) we show the complex plane of the 



rest of the parameters are as in Appendix B 



eigenvalues with two of the three parameters fixed, and the increase of the third through 
the bifurcation point (s). This occurs as c, r increases, and the non-monotonic increase of 
the positive real part of the eigenvalue as A increases is also clearly seen at the bottom 
of (a). 



Figure 9: The two topologies obtained by solving for the Hopf bifurcation condition. The 
one on the left is the subnetwork we address in this paper. 
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Figure 10: Global bifurcations leading to oscillations are exemplified in this figure, 
where an excitatory trajectory can be trapped into an osciUatory state by realignment 
of nuUcfines upon parameter changes. This can happen for both, changes of r and of c: 
the middle panel has parameter values 9* = {r = 80, c = 3.5, A = 12.4), whereas the left 
panel has 9 + Sc and the right panel has 9 + Sr, with Sc — 0.1 and Sr — 2. Either change 
modifies the xi nuUcline from )(— >■ ^. Time series plots for each case is shown above the 
phase planes. 
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Figure 11: Bursts of oscillations are controlled by the amount of noise in expres- 
sion. The X1-X2 planes in this figure are in logarithmic scale, (a): Oscillatory regimes in 
parameter space - red (dark online) region has a stable limit cycle; the light (grey online) 
has a coexisting steady state as well. (b):Histograms accumulated from a simulation of the 
stochastic kinetics in the red oscillatory phase with {xi, X2, P(xi, X2)) as the right-handed 
coordinate axes. Also shown are histograms for xi (unimodal) and X2 (bimodal). (c): 
Phase plane with Xi —x^ nuUclines and the trajectories of two different initial points, cho- 
sen to be on either side of the separatrix determined by the saddle-node (in the middle of 
the three intersections), (d): Histograms of trajectories in the light grey region of (a) for 
different noise strengths (which decreases from top to bottom) showing the redistribution 
of probability mass around the limit cycle away from low-expression peak. 
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